feat(plugins): Pianola manager-agent + plugin system + MCP tool bridge (beta)#1139
Draft
jSydorowicz21 wants to merge 75 commits into
Draft
feat(plugins): Pianola manager-agent + plugin system + MCP tool bridge (beta)#1139jSydorowicz21 wants to merge 75 commits into
jSydorowicz21 wants to merge 75 commits into
Conversation
Pianola is a new Encore-gated autonomous manager agent that watches agent tabs, detects when an agent is awaiting the user, classifies the ask and its risk, and either auto-answers low-risk prompts from the user's rules or escalates uncertain/high-risk ones. Foundation (steps 0-1 of Plans/pianola-implementation-plan.md): - Encore flag `pianola`, off by default, end-to-end: EncoreFeatureFlags type, DEFAULT_ENCORE_FEATURES and settingsMetadata defaults, CLI encore FEATURES + aliases (auto-pilot/autopilot/pilot/manager), and a Settings -> Encore toggle. - Shared contracts in src/shared/pianola/types.ts. - Pure, I/O-free classifier: prefers a structured AwaitingInputSignal, falls back to conservative heuristics, rates risk via a focused lexicon. - Pure, safety-first policy engine: high-risk always escalates; auto-answer only on an explicit matching rule; default is escalate. - 31 classifier/policy unit tests, plus CLI alias/default coverage. Nothing consumes the flag yet, so the feature is fully inert when off. Plans/ captures the verified codebase investigation and the build plan.
Addresses all findings from the codex review of the foundation: - CRITICAL: high-risk prompts now always escalate before any rule action, so a broad `ignore` rule can no longer suppress the most important alerts. Added a regression test. - Risk lexicon extracted into pianola-risk.ts with word-boundary regexes instead of raw substrings, fixing false positives (`auth` in `author`, `token` in `tokenizer`) and adding outward-facing/destructive coverage (push/publish/ release/merge PR/deploy/email/.env/private key/payment). Boundary tests added. - Empty-match `auto_answer` rules now escalate (an auto-answer rule must narrow its scope), preventing an accidental catch-all from auto-answering everything. - Numbered-choice detection (1) .. 2) ..) implemented to match the documented behavior; trailing-question-mark check tightened to end-of-message only. - Project/tab rule scoping normalizes path separators and casing for cross-platform robustness. - PianolaDecision is now a discriminated union so `answer` only exists on an auto_answer decision. - Added a compile-time drift guard tying PianolaMessage to the web-server SessionHistoryMessage contract. Typecheck clean; 58 Pianola/encore tests pass.
Step 2 of the Pianola plan, realized as a pure detector module rather than parser hot-path surgery (more maintainable; no parser/IPC/WebSocket contract changes, and the watcher derives the signal from session-show output it already has). - pianola-awaiting-detector.ts: detectAwaitingInput(content) returns a typed AwaitingInputSignal (precedence plan_review > permission > choice > question) with extracted options (numbered and slash/bracket forms); enrichWithAwaiting Input(messages) fills it onto assistant turns immutably before classification. - The classifier already treats a present signal as authoritative (high confidence), so structured prompts now upgrade from heuristic to structured. - Tests cover each kind, option extraction, null cases, immutability, and the detector+classifier integration. Typecheck clean; 64 Pianola tests pass.
Steps 3-4 of the plan; Pianola is now runnable end-to-end from the CLI. Layering: moved the pure brain (classifier, policy, risk, awaiting-detector) from src/main/pianola to src/shared/pianola, since both the CLI watcher and the future in-app engine consume it. Tests moved alongside. Storage: - shared/pianola/storage.ts: filenames, the PianolaDecisionRecord type, and a pure rule validator (drops malformed rules so a bad hand-edit cannot break the engine). - cli/services/pianola-store.ts: reads the rules file and appends to the JSON Lines decision audit log in the Maestro config dir (shared with the desktop, no native deps). Watcher + CLI: - shared/pianola/pianola-watcher.ts: runWatchIteration ties enrich -> classify -> decide -> dispatch/record with all I/O injected, so it is fully unit tested and reusable by a future engine. Dedups by last-handled assistant message id. - cli/commands/pianola.ts: `pianola watch <tab>` polls get_session_history and acts per rules (--dry-run, --interval, --once, --agent); `pianola rules` and `pianola log` are read views. Every command hard-gates on the pianola Encore flag. Typecheck clean across all configs; 108 Pianola/encore tests pass.
Addresses the runnable-MVP review findings: - HIGH audit-before-dispatch: the watcher records the decision BEFORE sending a message (intent record), then appends the dispatch outcome under the same id. A pre-dispatch audit-write failure now fails closed (no message sent). - HIGH bounded retry: a failed dispatch no longer advances the dedup cursor, so the prompt is retried up to MAX_DISPATCH_ATTEMPTS, then Pianola gives up instead of either looping forever or skipping it forever. - HIGH injection hardening: option extraction is strict (clean short tokens; no file paths, markdown links, version numbers) and a `choice` requires real question/choice context, so prose cannot be shaped into an auto-answerable choice. The classifier reuses the same option/asking helpers (one source). - MEDIUM decision records are validated and folded by id on read, so a malformed JSONL line cannot crash `pianola log`. - MEDIUM projectPath is threaded through the get_session_history result so project-scoped rules work in the CLI watcher. - MEDIUM the watch loop now contains iteration errors (logs and continues). - LOW a malformed rules file surfaces a one-line warning instead of silently disabling all rules. - Added watcher safety tests (audit ordering, retry/give-up, fail-closed), detector false-positive fixtures, store folding/validation tests, and CLI watch tests (mocked client + dispatch). Typecheck clean across all configs; 123 Pianola/encore tests pass.
…ormed-rules guard)
Adds the single pinned Pianola agent at the top of the Left Bar, gated by the pianola Encore flag. Pianola is a real claude-code-backed agent so it chats through the existing agent system. It is excluded from normal categories and cannot be renamed, duplicated, bookmarked, moved to a group, or deleted. - types: isPianola flag on Session - usePianolaAgent: one-time creation after sessions hydrate, flag-gated - useSessionCategories: exclude Pianola from all categories - SessionList: pinned Pianola section above Starred - SessionContextMenu: guard mutating actions for Pianola
…(v2 L2-4) Turns the pinned Pianola agent into Maestro's orchestrator using the existing, tested CLI surface (the chosen action layer over MCP). - CLI: new `maestro-cli pianola add-rule` so Pianola turns a conversation into a durable watcher rule; backed by writePianolaRules (validated, atomic) in the CLI store. - Prompt: src/prompts/pianola-system.md (identity, exact CLI invocations, task-dump orchestration, and the Hybrid confirmation discipline - auto for read/observe/watch, confirm before spawning or production sends). Registered in promptDefinitions (CORE_PROMPTS, PROMPT_IDS, quick actions). - Spawn: prepareMaestroSystemPrompt appends the Pianola prompt for the isPianola agent; process.ts injects MAESTRO_CLI_JS + MAESTRO_AGENT_ID env so Pianola's Bash can reach the CLI with no PATH assumptions. Reuses the existing maestro-cli path resolver (now exported).
Phase 1 of learning-from-history, plus the earlier fixes for the broken MAESTRO_CLI_JS path and the list-agents command name. - shared/pianola/transcript-mining.ts (pure, 23 unit tests): per-format line parsers (Claude Code + Codex), decision-pair extraction reusing enrichWithAwaitingInput + classifyMessages, reply-polarity heuristic, aggregation. - cli pianola learn: crawls ~/.claude/projects + ~/.codex/sessions into a labeled decision corpus (JSON or --out file). Verified: 154 pairs from real history. - logger: diagnostic notices to stderr so they no longer corrupt CLI --json output. - cue-cli-executor: add correct ../../cli candidate so MAESTRO_CLI_JS resolves in the preserved dev layout (also fixes Cue's latent dev path). - pianola-system prompt: use 'list agents' (real subcommand), add self-correction guidance to run --help and not substitute a stale CLI.
Three fixes to the transcript miner so Phase 2 gets a cleaner, fuller corpus: 1. Scoping flags on pianola learn: --since <date> (filter by file mtime), --project <substr> and --exclude <substr> (filter by session cwd). Lets the user target representative history and drop noise. 2. Drop topic-based aggregation (the classifier's topic is a per-ask snippet, not a stable category - it produced misleading topTopics). Replaced with a risk x polarity cross-tab, which is genuinely meaningful. 3. Widen mining recall: trigger extraction on the classifier itself (structured signal OR heuristic asks) instead of double-gating on the strict structured detector. Captures prose-style asks the detector missed. Live watcher is untouched. Result on real data: 154 -> 287 pairs, 20 -> 29 sessions. 24 unit tests (added a recall test for heuristic prose asks).
Add learned decision profiles so Pianola can recall how the user decides in a given project and judge low/medium-risk agent asks the way they would, escalating high-risk regardless. - storage contract: PianolaProfileEntry/PianolaProfiles types, validatePianolaProfileEntry/validatePianolaProfiles, resolveProfile (project profile, else global fallback, else none), with a PIANOLA_PROFILE_MAX_CHARS guard on profile size - CLI store: read/write/get/set profile helpers with atomic writes - CLI commands: pianola profile (read) and pianola set-profile (write from --file or piped stdin, optional --pair-count) - pianola-system prompt: learning/onboarding flow describing how to crawl history, synthesize a profile, get user sign-off, and recall it at decision time - tests: profile entry/collection validators and resolveProfile
When a watched agent hits an ask that no rule covers and that is not high risk, the watcher now hands the decision to Pianola (an LLM agent) to judge against the user's learned per-project profile instead of escalating straight to the user. High risk always escalates; dry runs never hand off; and the watcher stays purely rule-driven when no Pianola agent is available to hand off to. - watcher: optional resolveProfile + requestJudgment deps; a handoff branch that audits intent before the side effect (mirroring the auto-answer invariant), records the escalation, and marks the prompt handled so it is not re-handed-off - CLI: wires the handoff deps only when MAESTRO_AGENT_ID is set, builds a structured handoff prompt (waiting agent, ask, profile, how to answer or escalate), and guards against handing back to Pianola itself - prompt: tells Pianola what a decision handoff is and how to act on it, and to offer a rule when it keeps making the same call - tests: handoff fires/skips correctly across risk, profile presence, rule coverage, dry-run, and missing deps; audit-before-side-effect
The Pianola section drew a bold uppercase category header plus an accent-bordered box around the single manager agent (using the ungrouped variant, which sits flush with no horizontal margin), so one pinned agent read like a whole category and looked misaligned next to Starred/Ungrouped. Render it instead as a single flat row (normal row styling) with a pin marker and a divider below, so it reads as "the manager, pinned" rather than a section. The pin marker is shown for any isPianola session.
Give Pianola two pinned, non-closable views in its workspace - a
Dashboard and its Chat - in place of the normal file/terminal/browser
tab bar (Pianola is a manager surface, not a coding workspace). The
Dashboard is the default view and badges a live count of agents waiting
on the user.
The dashboard combines two live signals - desktop session states
(busy / waiting_input / idle) and Pianola's decision audit log
(escalations, handoffs, auto-answers) - into four sections: needs your
input, working now, recently done, and recent decisions. Rows jump to
the owning agent on click. The decision data flows through the existing
pianola.getDecisions IPC; the bucket derivation is a pure, tested
function.
- uiStore: pianolaView ('chat' | 'dashboard') + setter, default dashboard
- PianolaDashboard: view, data hook (live + polled), pure deriveDashboard
- PianolaWorkspaceTabs: the pinned Dashboard|Chat strip
- MainPanel: render the strip + swap content for Pianola
- tests: deriveDashboard bucket mapping
Captures the multi-agent audit: Pianola is a strong supervisor but not an orchestrator (4/10), Maestro is not plugin-ready (3/10), the Sprint 0 safety bundle, Track A orchestrator spine, and the tiered Track B plugin system roadmap, plus the resolved scope (full orchestrator, desktop daemon, full plugin SDK).
Closes four holes that undermined the human-in-the-loop premise of the autonomous watcher: - Durable watch-state rehydration: rehydrateWatchState(records, tabId) folds the audit log to seed lastHandledMessageId before the poll loop, so a restarted watcher no longer re-answers a prompt it already handled (the prior code recreated state fresh every start). - Watcher self-stop on Encore revoke: re-read encoreFeatures.pianola each poll and break, so toggling Pianola off in Settings actually halts in-flight autonomous answering instead of running until killed. - Proactive escalation notifications: optional deps.notify on the pure watcher; the CLI fires a notify_toast (jump-session click, sourceAgent Pianola, sticky+red for high-risk) on escalate, handoff failure, and handoff timeout, so blocking asks reach the user, not just a badge. - Handoff-failure fallback + timeout: a failed handoff no longer drops the ask; it falls back to an audited user escalation + notify. A successful handoff is tracked as pendingHandoff and, if Pianola never answers within HANDOFF_TIMEOUT_POLLS, re-escalates to the user. Watcher suite: 27 tests pass. CLI + main typecheck clean; CLI rebuilt.
The pure foundation the multi-agent coordinator consumes. Nothing in Pianola previously represented a task, a dependency, or an ordering, so a task-dump fired everything at once against incomplete upstream state. - pianola-tasks.ts (pure, immutable): PianolaTask/PianolaPlan, findPlanCycle (DFS, reports the cycle), validatePlan (shape + unknown/ self/duplicate deps + cycle detection -> errors, null plan on fatal), computeReadyTasks (pending with all deps done), markTaskStatus, propagateBlocked (fixed-point cascade from failed/skipped upstream), planProgress. - storage.ts: PIANOLA_PLANS_FILENAME + PianolaPlansFile + validatePianolaPlansFile (drops malformed plans). - CLI + desktop stores: read/write/get/upsert plans (atomic temp+rename). 37 tests pass; cli + main typecheck clean.
The pure trigger the orchestrator uses to advance the DAG: detects when
a dispatched task finished or failed so dependents can start and botched
tasks are noticed. Previously the classifier only emitted question|
blocked|none, so there was no done/failed signal at all.
- pianola-completion-detector.ts (pure): detectTaskOutcome({previousState,
currentState, recentMessages}) -> done|failed|working with a reason.
done = working-state -> idle transition with no failure marker; failed
= error state or a failure marker in the latest message; waiting_input
stays working (the watcher owns the ask). Failure lexicon ported from
parsers/error-patterns.ts; success heuristic mirrors cue-completion.
- hasFailureMarker + FAILURE_MARKER_PATTERNS exported for reuse.
24 tests pass; cli + main typecheck clean.
The coordinator that drives a task DAG to completion with concurrency
control. Sibling of the watcher: pure, dependency-injected, looped by a
CLI/daemon shell.
- pianola-orchestrator.ts: runOrchestratorIteration(state, deps,
{concurrencyLimit}) polls running tasks via detectTaskOutcome, marks
done/failed, propagateBlocked, then dispatches up to the concurrency
limit from computeReadyTasks (ensure/reuse agent -> dispatch -> mark
running). Failures leave tasks pending to retry without consuming a
slot. Persists the plan each iteration; done flips when all tasks are
terminal or blocked.
- Seeds prevStates='connecting' on dispatch so a fast task's first
idle poll resolves as done instead of hanging.
15 tests pass; cli + main typecheck clean.
The I/O shell that runs the pure orchestration engine end to end. - pianola plan set/list/show: author a task DAG as JSON (validated: cycles, unknown/self deps, and bad shape rejected), inspect plans and per-task status/progress. - pianola orchestrate <planId>: loops runOrchestratorIteration against the live desktop. Deps wired to real WS contracts - getRunState from list_desktop_sessions (2s memo), getRecentMessages from get_session_history, ensureAgent via create_session (reuse agentId or create by agentType), dispatch via runDispatch, persist via upsertPianolaPlan, task_failed toasts. --interval/--concurrency/--once. Mirrors the watcher loop: SIGINT, per-iteration Encore self-stop, finally-disconnect. - pianola-system.md: "Orchestrating a task DAG" section teaching Pianola to author a plan for interdependent tasks vs the dump-and-babysit flow for independent ones. cli + main typecheck clean; CLI builds; plan/orchestrate help verified.
Adds PianolaSupervisor, a main-process registry that owns Pianola's long-running watch and orchestrate processes as managed children: - Reconciles desired targets from the shared supervisor store and spawns supervised child CLI processes (pianola watch/orchestrate) via child_process, replacing orphaned nohup backgrounding. - Bounded-backoff restart on crash, health tracking, and fs.watch-driven reconcile so adds/removes/enable/disable take effect live. - Encore-gated on pianola; relaunches active targets on app start and tears them down on shutdown (lifecycle wired in main/index.ts). - Shared supervisor store (CLI + main read/write the same files): PIANOLA_SUPERVISOR_FILENAME + PianolaSupervisedTarget + validatePianolaSupervisorFile in shared storage; upsert/remove/list in both pianola-store (CLI) and pianola-store-main (desktop). - IPC: supervisor list/add/set-enabled/remove handlers + preload namespace + global.d.ts typings. - CLI: pianola supervise watch|orchestrate|list|remove|enable|disable. - Prompt: prefer `pianola supervise` over nohup for watch/orchestrate. - 9 supervisor-storage tests.
Stands up the permanent, semver-managed contract every later plugin tier builds on. Inert by default: gated behind a new `plugins` Encore flag, list-only, and no plugin code executes yet. Pure, bundle-safe contract (src/shared/plugins): - host-api.ts: HOST_API_VERSION + isHostApiCompatible (same-major, host >= declared minHostApi; strict on malformed). - plugin-manifest.ts: the frozen plugin.json shape + hand-rolled validator (matches the pianola storage convention; no new dep), with a strict plugin-id pattern. Host incompatibility is surfaced as a status, not a validation error, so incompatible plugins stay visible. - plugin-registry.ts: pure, immutable registry (buildRecord, upsert, remove, setEnabled refusing non-ok plugins, listActive, toEnableState). - storage.ts: versioned plugin state file + a generic forward migration runner (electron-store has no schema versioning) seeded with a v0 bare-boolean -> v1 migration. Main process (src/main/plugins): - plugin-store-main.ts: atomic temp+rename writes, ENOENT-as-empty, validation at the persistence boundary, safe-folder-name guard. - plugin-manager.ts: discover plugin dirs, validate manifests, derive the registry, persist enable toggles, install (copy dir by manifest id) and uninstall (dir removal guarded to the plugins dir). Self-gates on the Encore flag. Wiring: - IPC plugins:list/set-enabled/install/uninstall (handler-level Encore gate that throws PluginsDisabled, mirroring pianola). - window.maestro.plugins preload API + global.d.ts typings. - main/index.ts constructs the manager, refreshes on boot, broadcasts plugins:changed; registered in both production and the aggregate registrar. - `plugins` Encore flag added to the type, both default literals, the settings metadata, an EncoreTab toggle card, and searchableSettings. 34 new tests (host-api, manifest, registry, storage/migrations).
…Phase 1) Makes plugins real (data-only): a plugin can declare themes, prompts, settings, and command macros, and the user can install/manage them. - contributions.ts (pure, bundle-safe): typed Tier 0 contribution shapes + per-type validators that drop bad items with a reason rather than failing the whole plugin. Ids are namespaced `<pluginId>/<localId>` so two plugins never collide and every contribution traces to its source. collectContributions (one plugin) and aggregateContributions (across active plugins, with per-plugin error buckets and duplicate-id guard). - PluginManager.getContributions() aggregates contributions from active, loadable plugins; empty when the Encore flag is off. - IPC plugins:contributions (the read seam host registries will consume plugin data from) + window.maestro.plugins.contributions() + typings. - PluginsPanel: Settings -> Encore -> Plugins management UI. Lists installed plugins with status badges (enabled/disabled/invalid/ incompatible), enable/disable toggle, install-from-folder (folder picker), and uninstall, with toast feedback. Rendered inside the Plugins Encore card when the flag is on. 7 new contribution tests.
… (Track B Phase 3) Lets a plugin run code, confined and capability-scoped. Security model: signature trust + explicit install-time consent gate WHICH code runs; a default-deny permission broker gates WHAT it can do; a utilityProcess + vm context limit blast radius. Still tier-gated and Encore-gated; the highest-risk host methods (agents.dispatch, process.spawn) are left unwired pending a dedicated review. Pure, bundle-safe contracts (src/shared/plugins): - permissions.ts: fixed capability vocabulary, risk tiers, manifest permission parsing (unknown caps rejected, never degraded to allow-all), and isPermitted - default-deny matcher with path-prefix (separator boundary) and host-suffix (dot boundary) scope checks. - rpc-protocol.ts: the frozen host<->sandbox message shapes, the method->capability map, and target extraction (never throws). - signing.ts: signature.json shape, deterministic signing payload (sorted path:hash, excludes the signature file), and trust helpers. - manifest: tier-gated `entry` (relative, traversal-checked) + validated `permissions`; tier 0 forbidden from declaring either. Main process (src/main/plugins): - plugin-signature.ts: ed25519 verify over the canonical payload, with an EXACT file-set check so adding/removing/altering any file invalidates it; resolves unsigned/invalid/untrusted/trusted. - permission-broker.ts: the single authorization gate; re-reads grants per call (live revocation), audits every decision. - plugin-sandbox-entry.ts: child bootstrap that runs plugin code in a vm context with a frozen minimal global (no require/process/Buffer); all effects go through broker-gated RPC. Documents that vm is defense-in- depth, not the primary boundary. - plugin-sandbox-host.ts: forks one utilityProcess per running plugin, validates+size-caps every message, broker-authorizes, executes via injected handlers, graceful-then-hard teardown, crash isolation. - plugin-host-handlers.ts: fs/net(size-capped)/settings(secret-redacted)/ agents.list/notifications.toast implementations; dispatch/spawn injected. - grants store (atomic, validated) + manager integration: verify on discovery, default tier-1 to DISABLED (enable = consent), reconcile sandboxes on enable/disable/uninstall, forget grants on uninstall. UI + IPC: - PluginConsentDialog: lists requested capabilities (risk-colored, scoped, with reason) + signature trust; user approves a subset. Tampered (invalid-signature) plugins cannot be enabled. - plugins:get-grants/set-grants(subset-of-requested only)/revoke-grants IPC + preload + typings; signature badge in the panel. - main/index.ts wires broker + sandbox host + handlers + trusted keys into the manager and tears sandboxes down on quit. 29 new tests (permissions, rpc, signing incl. real ed25519 tamper/forge/ file-add/remove detection, broker isolation + live revocation). 82 plugin tests total.
Addresses the CRITICAL/HIGH findings from the Phase 3 security review:
- CRITICAL: broker path-scope bypass via `..`. normalizePath now collapses
`.`/`..` (leading `..` on absolute paths dropped) so `/scope/../../etc`
no longer prefix-matches `/scope`. (permissions.ts)
- HIGH: symlink/realpath escape after authorization. fs.read/fs.write now
resolve the real path (symlinks + deepest existing ancestor) and
RE-authorize it against the grant before touching disk. (host-handlers)
- HIGH: SSRF via redirect. net.fetch forces redirect:'error' and
allowlists init (method/body/headers) so a 3xx to a non-granted host
(metadata/localhost) cannot be followed. (host-handlers)
- HIGH: vm realm escape surface. The sandbox context no longer receives
host intrinsics (Object/Array/Promise/URL/...) - V8 supplies isolated
context-native ones - and timers are wrapped, closing prototype
pollution of the host and `URL.constructor('return process')` escapes.
Threat-model comment corrected (an escape is NOT broker-firewalled).
- HIGH: signed plugins could ship unsigned symlinks. Signature
verification now FAILS (invalid) on any symlink in the tree, and install
refuses a source containing symlinks.
- MEDIUM/LOW: fs.read size cap; expanded secret-key denylist for
settings.get; unscoped-net consent warning; Windows reserved folder
names rejected; per-plugin RPC rate limit + in-flight cap.
Adds traversal + symlink tests (85 plugin tests). Residual caveats logged
in Plans/plugin-build-review.md.
… (Track B Phase 2) A plugin can declare scheduled triggers (interval or daily clock times) that fire without touching the per-project Cue engine. Fired by a supervised, encore-gated scheduler reusing the Pianola-supervisor lifecycle pattern (managed, self-gating per tick, torn down on quit). - contributions.ts: new `cueTriggers` contribution type + validator (interval everyMinutes >= 1, or dailyTimes as HH:MM; action notify or dispatch; dispatch requires agentId). Invalid times are dropped, bad triggers rejected with a reason. Added to collect/aggregate. - plugin-scheduler.ts (pure): computeDueTriggers - intervals seed on first observation (no startup thundering herd) then fire on elapse; daily times fire at most once per matching clock-minute per day; state for removed triggers is dropped so it cannot grow unbounded. - plugin-scheduler-host.ts (main): poll loop, self-gates on the plugins flag, notify -> toast. dispatch is honored only when an implementation is injected (agents:dispatch is not wired yet); otherwise skipped with a log line, never silently dropped. - main/index.ts: construct + start the scheduler at boot, stop on quit. Deeper Cue-engine integration (file/agent-completion event triggers, prompt-dispatch actions) is deferred and logged in Plans/plugin-build-review.md - the Cue engine is per-project-yaml and flagged complex; global plugin triggers do not fit it without surgery. 17 new tests (scheduler timing + cueTrigger parsing).
Contributor
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
…d ESM)
`maestro plugin init` wrote an ESM entry.js (export function activate / export
default), but the sandbox loads the file raw via `new vm.Script(code)` and reads
`module.exports` - CommonJS, no module loader. `vm.Script` can't parse `export`,
so it threw SyntaxError and every freshly-scaffolded tier-1 plugin silently
failed to activate ("failed to start plugin").
- entryJsTemplate -> `module.exports = { activate, deactivate }` (runtime-compatible),
keeping the JSDoc @import types so it still type-checks without a build step.
- package.json scaffold `type: commonjs`; tsconfig `module`/`moduleResolution`
NodeNext (verified: module.exports type-checks under checkJs with no @types/node
and no node10 deprecation).
- Regression guard: load the scaffolded entry.js exactly as the sandbox does
(`new vm.Script` + bare `module` shim) and assert activate/deactivate bind.
12/12 plugin CLI tests pass; tsc (cli + lint) clean; ESLint + Prettier clean.
The dev guide already documented the CommonJS module.exports contract, but had no top-level authoring walkthrough and showed entry.js examples without the JSDoc @import typing - plus a section-14 ESM `import` that could be misread as the runtime entry shape (which would not parse in the vm.Script sandbox). - Quickstart mirroring `maestro plugin init` exactly: tier-0 vs tier-1 scaffold output (package.json type:commonjs, tsconfig NodeNext), the CommonJS entry.js shape, an explicit "no ESM in entry.js" warning, then validate/sign/pack/install. - Type the canonical entry.js examples (section 6 + quickstart) with the JSDoc @import tag so they are copy-paste-accurate and type-check. - Section 14: clarify SDK types come via a JSDoc @import in the plain-JS entry; the ESM `import` form is only for authoring in TypeScript compiled to CommonJS. Prettier-clean. CLAUDE-PLUGINS.md already delegates authoring here (no change).
…ion (Phase A)
Plan: Plans/plugin-full-surface-plan.md — the full-surface plugin model where
enabling a plugin is a full-trust decision and the security boundary is the
unforgeable authorization gate, not runtime sandboxing of authorized code.
Lands the gate's core (more Phase-A wiring follows):
- AuthorizationStore: ONE profile-wide sealed ledger (Electron safeStorage),
replacing the plain-JSON enable + grants files. Enable and grant are unified —
enabling IS minting an authorization; there is no separate set-enabled path.
- Anti-rollback: a monotonic epoch + per-install secret held in a NAMED OS
credential entry (outside the rollable file tree). A restored old sealed ledger
fails the epoch check -> re-consent. Closes the replay/rollback hole.
- Tombstones: uninstall is authoritative (always bumps epoch + records a
tombstone, even for a never-enabled plugin) so a restored folder can never
silently re-enable.
- Identity binding: a grant binds {contentHash, signatureStatus, signerKey}, so a
post-consent code OR signer/trust change forces re-consent (the content digest
excludes signature.json, so signer identity is tracked explicitly).
- Fail-safe = session-only: missing seal/anchor (keyring-less Linux) or a locked
keyring -> grants live in memory for the session, never silently persisted.
persist() catches seal/anchor/disk failures and degrades safely; no mode
persists authorization without the anchor.
- computePluginContentHash() reuses the signer's canonical file digest.
Dependency-injected (seal/anchor) so rollback, tombstone, tamper, locked-keyring,
and session-only paths are unit-tested. 11 security tests green; tsc (main+lint)
clean; Prettier clean.
Not yet wired (next Phase-A commits): production safeStorage + keyring factory
(adds @napi-rs/keyring, externalized + asarUnpack), the isolated minter IPC
(sender-frame + nonce + user-activation), the refresh-time verifier, and the
separate non-extensible consent window.
…-no-inline-cast-access)
…anifest-subset gate (Phase A)
…stry (minter core, Phase A)
…ity gated aggregation (Phase C)
… palette (Phase C render host)
… (Phase A) ConsentMinter is the only path that creates trust, enforcing the three contract checks the nonce core alone could not: - nonce issued ONLY in the main-owned open path (requestConsent), never a renderer-callable endpoint, and bound to THIS prompt (a superseded prompt's still-live nonce cannot validate against a newer one); - confirm accepted ONLY from the recorded consent frame via a frame-level ConsentSender token (webContents id + frame routing id + url) - a bare webContents id a subframe/navigation could share is not enough; - no renderer-supplied "user activation" flag trusted; trust derives from sender-frame identity + one-time nonce alone. On confirm it consumes the nonce, resolves the plugin's current identity, re-checks the transcripts:read + egress mutual-exclusion at the mint boundary, and mints into the sealed AuthorizationStore. pluginIdentity() maps an installed dir -> AuthIdentity (content digest + signature status + signer key), guarding both the content-hash and the signature read so an unhashable/unreadable plugin returns null (never mintable) instead of crashing the minter/refresh verifier. Pure given injected deps (no Electron) - 28 tests. The IPC layer wires openPrompt to a real consent BrowserWindow/WebContentsView and builds the ConsentSender from event.senderFrame; switching the broker/getGrants to read the ledger is the remaining app-level step.
…bilities Phase B added ui:contribute / ui:panel / ui:render-unsafe to the host capability vocabulary but never bumped the host-API version or re-vendored the standalone @maestro/plugin-sdk, so the drift guard was red. - HOST_API_VERSION 1.3.0 -> 1.4.0 (source + vendored): a backward-compatible capability addition is a MINOR bump per the host-api contract. - Re-vendor the SDK capability surface: the three caps in the PluginCapability union, PLUGIN_CAPABILITIES, CAPABILITY_RISK, CAPABILITY_SCOPE_KIND, and describeCapability - byte-for-byte with src/shared/plugins/permissions.ts. - Vendor the UI-item contribution surface the SDK was also missing: UiSurface, UI_SURFACES, isUiSurface, UiItemContribution, and the uiItems field on PluginContributions + AggregatedContributions. - Extend the drift guard: pin 1.4.0 and add UI_SURFACES parity (the prior guard only covered capabilities/tiers/topics/methods, so the contribution surface could silently fall behind). - Bump @maestro/plugin-sdk 0.1.0 -> 0.2.0 in lockstep with the contract. - Docs: add the ui:* rows to the capability tables in PLUGIN-DEVELOPMENT.md and CLAUDE-PLUGINS.md, the gating note, and the 1.4.0 minHostApi refs. SDK 16 tests green; host-api/permissions/contributions 61 green; tsc clean across SDK + main.
…ibutions The runtime drift guard compares vocabularies/catalogs but can't see a new or changed FIELD on a contribution interface. Add a vitest typecheck pass (drift.test-d.ts via tsconfig.test.json) asserting the vendored UiItemContribution / PluginContributions / AggregatedContributions are type-identical to their host sources, so a future un-vendored field fails tsc. Confirms the full contribution graph is currently in parity.
…sion (Phase A) Two layers wired into the manager: 1. getActiveRecords() now excludes records whose signature is 'invalid'. This is the single authoritative "active" filter (contributions, agents), so tampered code can never contribute on ANY path - refresh, setEnabled, or a future toggle - even though listActive()/the enable toggle do not themselves check the signature. (isRunnable already excluded it for the sandbox; this closes the contribution/agent side.) 2. refresh() consults an optional verifyRecord seam: an enabled, runnable code-tier plugin the injected gate rejects (consented identity no longer matches the bytes on disk, or it was removed) is force-DISABLED during discovery. Only force-disables; absent by default (the enable toggle + consent govern), so current behavior is unchanged until production injects the seam (AuthorizationStore.verify() + pluginIdentity() against the live ledger) - the deferred app-integration step, same pattern as getGrants. 7 tests, incl. a setEnabled-bypass proof for the tamper exclusion; tsc + eslint clean; 175 plugin tests green.
Wires the dedicated, non-extensible consent window and the isolated minter
end to end so the sealed AuthorizationStore is populated through the only
trusted path. CDP-verified in the built app: requesting consent for an
installed plugin opens the consent window, approving a subset mints ONLY
that subset into the ledger, and the window auto-closes.
- src/main/plugins/consent-window.ts: opens a modal, non-resizable, menu-less
BrowserWindow that loads its OWN minimal consent.html with its OWN minimal
consent preload (NOT the main SPA / full preload), hands the offer + one-time
nonce in via additionalArguments only, denies navigation/new windows, and
returns the window's ConsentSender (webContents id + main frame routing id +
url) for confirm-time frame validation.
- src/main/preload/consent.ts + src/main/consent/consent.html: the isolated
surface; exposes ONLY window.pluginConsent { offer, confirm, cancel }.
- scripts/build-preload.mjs: also builds consent-preload.js + copies consent.html.
- index.ts: constructs the AuthorizationStore (safeStorage seal + default
noAnchor -> session-only until the keyring/packaging step) and the
ConsentMinter; registers plugins:request-consent (main-renderer-only, opens
the window), plugins:confirm-consent (validates event.senderFrame against the
recorded consent frame + one-time nonce, then mints + audit-logs the granted
subset), and plugins:cancel-consent. Superseding a prompt closes the prior
window so a stale nonce can't close the new one.
- main preload + global.d.ts: requestConsent bridge for the trusted renderer.
Trust derives from the host-owned window + sender-frame + one-time nonce, not
a renderer-supplied activation flag (which a sandboxed surface could forge).
Reads (broker/manager/getGrants) still use the on-disk store: the read-flip to
make the ledger the live enforcement source is the documented next step, so
set-grants stays the live path until it is replaced.
tsc x3 + ESLint clean; 175 plugin tests green.
…p, Phase A) Completes "Verifier in plugin-manager.refresh": the broker, contribution gating, event bus, transcript-conflict check, scheduler dispatch, and the get-grants IPC now ALL read the sealed AuthorizationStore instead of the forgeable on-disk grants store. The manager's verifyRecord seam is wired to authStore.verify() + pluginIdentity(), so a refresh force-disables an enabled code-tier plugin whose consented identity no longer matches the bytes on disk (tamper) or that was removed. Consent flow is now the only grant path: - plugins:set-grants is REMOVED. A code-tier plugin can only be granted via the host-owned consent window (plugins:request-consent -> ConsentMinter.confirm), which mints the approved subset and then enables + reconciles the plugin. - plugins:set-enabled GATES code-tier activation: enabling a tier>=1 plugin is rejected (PluginNotAuthorized) unless it already holds a ledger grant, so the renderer can never start plugin code without consent. (4 new handler tests.) - revoke-grants drops the sealed grant AND disables the plugin; uninstall tombstones it in the ledger. - PluginsPanel routes code-tier enable through requestConsent; the in-renderer PluginConsentDialog (superseded by the host window) is removed. Removed the dead `registerAllHandlers` plugin block (a divergent local registration; production registers individually in index.ts). Session-only fallback (noAnchor) until the keyring persistent anchor lands - re-consent each launch, acceptable for the flag-gated feature; the keyring anchor is a separate packaging task. Verified end-to-end via CDP in the built app: ungranted tier-1 enable rejected; consent mints fs:read only (not the unchecked notifications:toast) + enables; getGrants reads the ledger; revoke drops the grant + disables. tsc x3 + ESLint clean; 176 plugin tests green.
…gress (Phase D) Reduces the documented plugin-panel self-navigation egress residual (a panel setting window.location to a remote URL to leak data obtained via its granted caps) with two defense-in-depth layers: - Bridge origin-gating: the maestro:invokeCommand handler now requires event.origin === 'null' (the sandboxed srcDoc frame's opaque origin) in addition to the source check. If a panel self-navigates to a real origin the WindowProxy source can survive, but its messages then carry a non-null origin and are rejected, so a navigated frame can no longer reuse the bridge. - Main-process subframe backstop: blocksSubframeNavigation (pure, unit-tested) wired into the main window's will-frame-navigate blocks any subframe navigating away from its initial about:srcdoc/about:blank document (data: included - it would drop the CSP while keeping the opaque origin). All main-window iframes are srcDoc (plugin panels + file-preview); browser tabs are separate guest WebContents, so the guard is scoped without keying off a mutable window.name. HONEST SCOPE: the load-bearing barriers stay the iframe sandbox (no allow-top-navigation), the SPA frame CSP, and the origin-gated bridge. will-frame-navigate did not reliably surface sandboxed-srcdoc self-navigation in CDP probing, so the new guard is a belt-and-suspenders main-process layer, not the sole barrier - the comments say so. 6 new unit tests for blocksSubframeNavigation; tsc x3 + ESLint clean; 331 plugin/renderer tests green.
…rity-deferrals
The two remaining plugin-surface items are deliberate Phase-3 security
deferrals, not implementations. Convert them from prose intentions into
ENFORCED regression invariants so the boundary fails loudly if weakened.
E (agents:dispatch / process:spawn — arbitrary-code-execution grade):
- factory test: buildHostCallHandlers omits both verbs with default deps,
still exposes read-only agents.get, and the deps are the sole gate.
- AST source guard (plugin-host-deps-wiring.test.ts): the live
buildHostCallHandlers({...}) call in index.ts must pass no dispatch/spawn
dep (parses the TS AST, not text), so wiring either fails CI.
D (ui:render-unsafe — highest-risk render escape hatch, still inert):
- gate test: holding only ui:render-unsafe yields uiItems=[]/panels=[], so
the escape-hatch grant can't silently unlock host-rendered surfaces.
No production wiring changed; both verbs and the unsafe render surface stay
inert. tsc x3 + ESLint + Prettier clean; 63 affected tests green.
Two bugs found while testing the plugin permission surface end-to-end: - PluginsPanel never subscribed to the main-process `plugins:changed` broadcast, so after the host-owned consent window minted a grant and enabled a code-tier plugin, the toggle stayed stale (the approval looked like a no-op). Subscribe to window.maestro.plugins.onChanged -> reload, with a regression test (subscribe-and-reload + unsubscribe-on-unmount). - A rejected consent confirm (e.g. the transcripts:read + egress conflict for an untrusted plugin) closed the consent window and returned the rejection to the now-closed window, so it failed silently. Surface the reason as a toast and leave the plugin disabled. tsc (main + renderer) + ESLint + Prettier clean; affected tests green.
…, gated dispatch Builds out three plugin extension planes (Encore-gated, no new capabilities): Observe — new metadata-only event topics agent.exited / agent.error / usage.updated / run.completed, emitted from a dedicated plugin-event-listener bridged off the central ProcessManager listeners (exit/error/usage/query-complete). Scalar payloads only; the bus still sanitizes + re-authorizes every delivery. Render — plugin uiItems now render in the sidebar, activity-bar, and toolbar surfaces (previously only 'menu' via the palette) via a new PluginUiItemsSlot that invokes the plugin's own namespaced command through plugins.invokeCommand. Act — scheduler cue-trigger dispatch is risk-gated by a pure, tested evaluateScheduledDispatch: auto-eligible only when low/medium risk AND the plugin holds agents:dispatch AND is trusted (signed). Eligible triggers are surfaced to the user (notify); a blind auto-send sink is intentionally NOT wired (a static manifest cueTrigger can't safely address a runtime session id), and the raw agents.dispatch / process.spawn host methods stay inert (deps-wiring guard green). HOST_API_VERSION 1.4.0 -> 1.5.0 (backward-compatible); SDK contracts re-vendored + drift guard updated. tsc x3 + ESLint + Prettier clean; 350 affected plugin/ renderer tests + 16 SDK drift tests green. (ui:command real palette invocation deferred — needs an always-on renderer command registry + request/response IPC.)
Add metadata-only `cue.runStarted` / `cue.runFinished` event topics so
plugins (events:subscribe) can observe automation-run lifecycle:
- cue.runStarted: { runId, sessionId, subscriptionName }
- cue.runFinished: { runId, sessionId, subscriptionName, status,
pipelineName?, durationMs? } - emitted from BOTH terminal paths
(onRunCompleted: completed/failed/timeout; onRunStopped: stopped),
so plugins never miss a terminal transition.
Wired via typed CueRunManager/CueEngine deps (a new onRunStarted
callback + the existing onRunCompleted/onRunStopped), never by scraping
onLog payloads. Emits route through a guarded safeEmitPluginEvent so a
throwing plugin bus can never strand a run outside cleanup - onRunStarted
fires before the run manager's try/finally.
HOST_API_VERSION 1.5.0 -> 1.6.0 (backward-compatible topic add). SDK
contracts re-vendored (topics + payloads + version) and the drift guard
re-pinned to 1.6.0.
Verification: tsc x3 (main/lint/cli) clean; ESLint + Prettier clean;
cue-engine + run-manager suites green (188); SDK drift green (16).
Tests assert start+finish on completion, status:'stopped' on manual
stop, and that the run completes even when the plugin sink throws.
Autonomous validation of the tier-1 plugin runtime end-to-end, so plugin
work no longer needs a manual GUI session to verify.
- e2e/fixtures/plugins/maestro-e2e-selftest: versioned self-test plugin
(portable __FS_SCOPE__ / __RUN_ID__ placeholders) that probes a
representative capability set and logs PASS/DENY per capability.
- e2e/fixtures/plugin-harness.ts: launches an ISOLATED Maestro via demo
mode (MAESTRO_DEMO_DIR -> app.setPath('userData')), seeds the plugins
Encore flag + enabled state + the fixture (fs scope deliberately OUTSIDE
userData, which the broker structurally refuses), and captures the
main-process stdout where the sandbox's forwarded console.log lands. A
per-run id marker prevents stale-log false-passes (getLogsDir is not
demo-redirected, so the on-disk log is neither isolated nor reliable).
- e2e/plugins.spec.ts:
* discovery + default-deny: an enabled-but-ungranted plugin is
discovered, its real utilityProcess sandbox spawns and activates, and
every brokered capability is DENY.
* selective consent: drives the host-owned consent window, approves a
subset (withholds net:fetch + ui:command), and asserts approved->PASS
/ withheld->DENY against live per-call grant enforcement.
Run: bunx playwright test e2e/plugins.spec.ts
(build dist first: bun run build:main && bun run build:renderer)
Both specs pass locally (ESLint + Prettier clean).
Expand the plugin e2e from the security chain to the full interface, all against a real isolated Maestro (demo mode) + real utilityProcess sandbox: - Full broker matrix: the fixture now probes all 15 callable capabilities and classifies each PASS / DENY / INERT. Granting via the real consent window proves approved caps function, host-unwired verbs (agents:dispatch, process:spawn, ui:command) are INERT, net:fetch is broker-allowed, and a withheld cap stays DENY. - Event delivery: subscribe -> fire a real host session.updated (history-dir watcher) -> assert the sandbox handler received it. Proves the event bus actually delivers, not just that events.subscribe succeeds. - Contributions: declares every tier-0 bucket (themes, prompts, settings, commandMacros, cueTriggers, agents, tools, keybindings) plus commands / uiItems / panels. Asserts tier-0 aggregate for an enabled plugin, that ui:contribute / ui:panel gate uiItems / panels, and that revokeGrants clears them and re-denies the broker. - Trust + conflict: an untrusted plugin approving transcripts:read together with egress (net:fetch / process:spawn) is rejected as a consent conflict (whole mint fails -> nothing granted); a freshly ed25519-signed + trusted plugin lifts the conflict and transcripts:read functions. Harness additions: ed25519 signing mirroring the host's frozen contract + trusted-key seeding, a reusable consent-window driver, and a session.updated trigger. Fixture self-test stays portable (__FS_SCOPE__/__RUN_ID__). Run: bunx playwright test e2e/plugins.spec.ts (build dist first: bun run build:main && bun run build:renderer) All 5 specs pass locally; ESLint + Prettier clean.
Workflowz analysis (6 parallel agents) of the plugin extension surface and the 5 Encore features. Finding: directorNotes is hard and usageStats / symphony / maestroCue / pianola are infeasible-now as plugins, all blocked on the same 14 platform gaps (OS sandbox, inert agents:dispatch/process:spawn, ui:command stub, persistent background services, richer host-API act verbs, UI hosts, registry, etc.). Plans/plugin-platform-and-encore-uplift.md turns those gaps into layered, parallelizable workstreams (P0 contracts -> P1 foundations -> P2 security core -> P3 host-API breadth -> P4 feature lifts), each with an acceptance test that extends the e2e plugin harness; designs the "Encore Features" marketplace UI (tiled grid + category filters + details install/uninstall/configure + only-installed toggle), which is buildable now; and lays out the worktree decomposition + merge strategy. WS-os-sandbox is flagged as a native tech decision requiring approval (no Rust baked in).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Beta / visibility draft
Opening as a draft for visibility — the autonomous-manager-agent feature is behind the off-by-default
plugins/pianolaEncore flags. The plugin authorization gate + UI surface is now complete and verified (see Update below), but the PR stays a draft pending team review and a rebase ontorc(currently merge-DIRTY); it is not proposed for immediate merge. Sharing the full diff + the audit so the team can see the shape and weigh in.Update — authorization gate + full UI surface + security locks (since the description above)
Since this description was written, the plugin authorization spine and UI capability surface landed and were verified, and the two highest-risk capabilities are now locked as regression-guarded deferrals:
getGrantsreads the sealed ledger; revoke drops + disables.will-frame-navigatesubframe backstop + unit tests) — it was a follow-up below, now resolved.agents:dispatch/process:spawn(arbitrary-code-execution grade): the handler factory omits both verbs, and an AST source-guard asserts the livebuildHostCallHandlers({…})call inindex.tspasses nodispatch/spawndep — wiring either fails CI.ui:render-unsafe(highest-risk render escape hatch): declared, consent-gated, renders nothing; a gate test proves holding only it still yieldsuiItems=[]/panels=[], so it can't silently unlock host-rendered surfaces.tsc ×3+ ESLint + Prettier clean; affected suites green.Update 2 — extension surface: lifecycle events, uiItem surfaces, gated dispatch (+ UI-sync fixes)
Extending what plugins can observe / render / act on. Still Encore-gated, no new capabilities, no OS sandbox (enabling a code plugin remains a full-trust act).
agent.exited,agent.error,usage.updated,run.completed, emitted from a dedicated plugin-event listener bridged off the central ProcessManager listeners. Scalar payloads only (a test asserts no message/raw/secret leakage); the bus still sanitizes + re-authorizes every delivery against live grants.sidebar/activity-bar/toolbar(previously only themenu/palette surface), via a newPluginUiItemsSlotthat invokes the plugin's own namespaced command.evaluateScheduledDispatch: auto-eligible only when low/medium risk AND the plugin holdsagents:dispatchAND is trusted (signed). Eligible triggers are surfaced to the user (notify) — a blind auto-send sink is intentionally NOT wired (a static manifest cueTrigger can't safely address a runtime session id).ui:command(real palette invocation) and the rawagents.dispatch/process.spawnhost methods remain deferred / inert (the deps-wiring guard stays green).plugins:changed(the enable toggle was stale after approving consent); a rejected consent confirm now surfaces a toast instead of failing silently.HOST_API 1.4.0 → 1.5.0 (backward-compatible); SDK contracts re-vendored + drift guard updated.
tsc ×3+ ESLint + Prettier clean; 350 affected plugin/renderer tests + 16 SDK drift tests green.What's in it (all Encore-gated, off by default)
utilityProcesssandbox, net-egress SSRF guard, an ed25519 signing CLI +@maestro/plugin-sdkauthoring package.toolsto a spawned agent's model over MCP (maestro-cli mcp serve); every model-initiated call is risk-gated before the broker runs it.~220 files vs
rc.Audit (this PR)
Greptile-5/5-style audit over the full diff: 4 reviewer subagents (MCP, plugin system, Pianola, integration/SDK) + 2 GPT-5.5
codexread-only passes (MCP bridge, plugin security core). Triaged every finding; fixed the real ones:Critical / High
--mcp-configbefore the script path and breaking the launch.tools/callcan no longer reach an arbitrary plugin command handler: the bridge rejects unmapped names and the app validates the toolId is a declared tool.plugins:set-grantsnow enforces thetranscripts:read+ egress mutual-exclusion at the consent boundary (was UI/runtime only — a direct IPC call could persist the conflicting grant).Correctness
deploy/publish).try/catch, decision-log per-record byte cap.getPanelHtmlrealpath containment; drop--strict-mcp-config; forward data-dir env to the bridge;-32700on malformed MCP frames; SDK SPDX license.+28 tests.
tsc×3 clean · ESLint + Prettier clean · SDK drift intact · affected suites green.MCP auto-injection status (honest)
verified:false, not auto-injected): opencode (itsOPENCODE_CONFIGvsOPENCODE_CONFIG_CONTENTmerge needs a live check or it could re-enable prompt-hangs); best-guess adapters for gemini/qwen/copilot/droid/hermes/pi.Known follow-ups (not blocking the draft)
vmrealm-escape — the accepted Phase-3 OS-sandbox decision.